(57) Abstract



A video encoding method and apparatus for adapting a video input to a bandwidth of a transmission channel of a network
includes determining the number N of enhancement layer bitstreams capable of being adapted to the bandwidth of the transmission
channel of the network. A base layer bitstream is encoded from the video input, and a plurality of enhancement layer bitstreams are
encoded from the video input based on the base layer bitstream, wherein the plurality of enhancement layer bitstreams complements
the base layer bitstream. The base layer bitstream and the N enhancement layer bitstreams are transmitted to the network.
SCALABLE VIDEO CODING AND DECODING

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to a method and apparatus for scaling data
signals to the bandwidth of the transmission channel, and more particularly to a scalable
video method and apparatus for coding video such that the received video is adapted
to the bandwidth of the transmission channel.

Description of Related Art 



Signal compression in the video arena has long been employed to make more
efficient use of the bandwidth available to the generating, transmitting, or receiving
device. MPEG, an acronym for Moving Picture Experts Group, refers to the family of
digital video compression standards and file formats developed by the group. For
instance, an MPEG-1 video sequence is an ordered stream of bits, with special bit
patterns marking the beginning and ending of a logical section.

MPEG achieves a high compression rate by storing only the changes from one
frame to another, instead of each entire frame. The video information is then encoded
using the DCT (Discrete Cosine Transform), a technique for representing waveform
data as a weighted sum of cosines. MPEG uses a type of lossy compression wherein
some data is removed, but the loss is generally imperceptible to the human eye. It
should be noted that the DCT itself does not lose data; rather, data compression
technologies that rely on the DCT approximate some of the coefficients to reduce the
amount of data.
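The statement that the DCT itself loses nothing can be checked numerically. The following is a minimal sketch, assuming a textbook orthonormal DCT-II in floating point rather than the exact transform and accuracy rules of any MPEG standard; the names `dct_2d` and `idct_2d` and the sample block are illustrative only. It transforms a smooth 8x8 block, shows that the energy gathers in a few low-frequency coefficients, and confirms that the inverse transform recovers the block.

```python
import math

N = 8  # block size used for the DCT

def _scale(k, n=N):
    # Orthonormal scaling: the DC term uses sqrt(1/n), all others sqrt(2/n).
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct_1d(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
            for k in range(n)]

def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]

def _transpose(m):
    return [list(row) for row in zip(*m)]

def dct_2d(block):
    # Separable transform: rows first, then columns.
    rows = [dct_1d(r) for r in block]
    return _transpose([dct_1d(c) for c in _transpose(rows)])

def idct_2d(coeffs):
    # Inverse columns first, then rows (order does not matter for a separable transform).
    partial = _transpose([idct_1d(c) for c in _transpose(coeffs)])
    return [idct_1d(r) for r in partial]

# A smooth 8x8 block (a gentle ramp), typical of natural image content.
block = [[100 + 2 * i + j for j in range(N)] for i in range(N)]
coeffs = dct_2d(block)
restored = idct_2d(coeffs)
```

For this ramp only the DC coefficient and a few first-row and first-column terms are non-zero, and `restored` matches `block` up to floating-point rounding; any loss in a real codec comes from the quantization that follows.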

The basic idea behind MPEG video compression is to remove spatial 
redundancy within a video frame and temporal redundancy between video frames. The 
DCT-based (Discrete Cosine Transform) compression is used to reduce spatial 
redundancy and motion compensation is used to exploit temporal redundancy. The 
images in a video stream usually do not change much within small time intervals. 
Thus, the idea of motion compensation is to encode a video frame based on other
video frames temporally close to it. 

A video stream is a sequence of video frames, each frame being a still image.
A video player displays one frame after another, usually at a rate close to 30 frames per
second. Each frame is divided into macroblocks, each consisting of four 8x8 luminance
blocks and two 8x8 chrominance blocks. Macroblocks are the units for
motion-compensated compression, whereas blocks are the basic units for DCT
compression. Frames can be encoded as three types: intra-frames (I-frames), forward
predicted frames (P-frames), and bi-directional predicted frames (B-frames).

An I-frame is encoded as a single image, with no reference to any past or future
frames. Each 8x8 block is encoded independently, except that the coefficient in the
upper left corner of the block, called the DC coefficient, is encoded relative to the DC
coefficient of the previous block. The block is first transformed from the spatial
domain into a frequency domain using the DCT, which separates the signal into
independent frequency bands. Most of the signal energy ends up in the low-frequency
coefficients in the upper left corner of the resulting 8x8 block. After the DCT
coefficients are produced, the data is quantized, i.e., each coefficient is divided by a
quantizer step size and rounded to an integer. Quantization can be thought of as
ignoring lower-order bits and is the only lossy part of the whole compression process
other than sub-sampling.
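The divide-and-round step can be sketched as follows, assuming a single uniform step size for every coefficient; actual MPEG quantization also applies weighting matrices and a per-macroblock quantizer scale, which this illustration omits. The coefficient values and the step of 16 are made up.

```python
def quantize(coeff, step):
    """Divide by the step size and round half away from zero."""
    level = int(abs(coeff) / step + 0.5)
    return level if coeff >= 0 else -level

def dequantize(level, step):
    """Inverse quantization: only the multiple of the step size survives."""
    return level * step

coeffs = [884.0, -31.4, 12.2, 3.9, -1.2, 0.4]
levels = [quantize(c, 16) for c in coeffs]        # [55, -2, 1, 0, 0, 0]
restored = [dequantize(q, 16) for q in levels]    # [880, -32, 16, 0, 0, 0]
errors = [c - r for c, r in zip(coeffs, restored)]
```

Small high-frequency coefficients collapse to zero, which is what the run-length stage exploits, and the reconstruction error is bounded by half the step size.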

The resulting data is then run-length encoded in a zig-zag order to optimize
compression. The zig-zag ordering produces longer runs of 0's by taking advantage of
the fact that there should be little high-frequency information (more 0's as one zig-zags
from the upper left corner towards the lower right corner of the 8x8 block).
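The sketch below generates the conventional 8x8 zig-zag scan and turns the scanned levels into (run, level) pairs, where run counts the zeros preceding each non-zero level. It illustrates the idea only; the MPEG variable-length code tables, the end-of-block code, and the alternate scan are not reproduced, and the sample block is hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col indexes an anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)              # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order

def run_length(levels):
    """(run, level) pairs; trailing zeros are left to an end-of-block code."""
    pairs, run = [], 0
    for v in levels:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][2] = 55, -2, 1, 3
scanned = [block[r][c] for r, c in zigzag_order()]
pairs = run_length(scanned)                    # [(0, 55), (0, -2), (0, 1), (9, 3)]
```

The 51 zeros after the last non-zero level cost nothing beyond the end-of-block marker, which is why the scan order matters.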

A P-frame is encoded relative to the past reference frame. A reference frame is
a P- or I-frame. The past reference frame is the closest preceding reference frame. A
P-macroblock is encoded as a 16x16 area of the past reference frame, plus an error
term.

To specify the 16x16 area of the reference frame, a motion vector is included.
A motion vector of (0, 0) means that the 16x16 area is in the same position as the
macroblock being encoded; other motion vectors are relative to that position. Motion
vectors may include half-pixel values, in which case pixels are averaged. The error term
is encoded using the DCT, quantization, and run-length encoding. A macroblock may
also be skipped, which is equivalent to a (0, 0) vector and an all-zero error term.
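A minimal sketch of how a decoder might rebuild a P-macroblock from the reference frame, a motion vector, and a decoded error term, assuming integer-pixel vectors given as (vertical, horizontal) offsets; half-pixel averaging and picture-boundary handling are omitted. Frames are plain lists of rows, and the block size is a parameter (16 for a real macroblock) so the example can stay small.

```python
def predict(ref, top, left, mv, size=16):
    """Copy the size x size area of the reference frame displaced by mv."""
    dy, dx = mv
    return [[ref[top + dy + i][left + dx + j] for j in range(size)]
            for i in range(size)]

def reconstruct(ref, top, left, mv, error, size=16):
    """Prediction plus the (already inverse-transformed) error term."""
    pred = predict(ref, top, left, mv, size)
    return [[pred[i][j] + error[i][j] for j in range(size)] for i in range(size)]

def reconstruct_skipped(ref, top, left, size=16):
    """A skipped macroblock behaves like a (0, 0) vector with an all-zero error."""
    zero = [[0] * size for _ in range(size)]
    return reconstruct(ref, top, left, (0, 0), zero, size)

ref = [[10 * r + c for c in range(8)] for r in range(8)]
moved = reconstruct(ref, 2, 2, (1, -1), [[1, 0], [0, -1]], size=2)   # [[32, 32], [41, 41]]
skipped = reconstruct_skipped(ref, 2, 2, size=2)                      # [[22, 23], [32, 33]]
```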

A B-frame is encoded relative to the past reference frame, the future reference 
frame, or both frames. 

A pictorial view of the above processes and techniques in application is
depicted in prior art Fig. 15, which illustrates the decoding process for SNR
scalability. Scalable video coding means coding video in such a way that the quality of
a received video is adapted to the bandwidth of the transmission channel. Such a
coding technique is very desirable for transmitting video over a network with a
time-varying bandwidth.

SNR scalability defines a mechanism to refine the DCT coefficients encoded in
another (lower) layer of a scalable hierarchy. As illustrated in prior art Fig. 15, data
from two bitstreams is combined after the inverse quantization processes by adding
the DCT coefficients. Until the data is combined, the decoding processes of the two
layers are independent of each other.
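The combination step can be sketched as below. It is a simplified model, assuming plain uniform inverse quantization in each layer (MPEG-2 additionally uses weighting matrices, mismatch control, and saturation); `base_step` and `enh_step` are hypothetical step sizes. Each layer is inverse-quantized on its own, and the resulting DCT coefficients are added before the inverse DCT.

```python
def inverse_quantize(levels, step):
    return [q * step for q in levels]

def combine_snr_layers(base_levels, base_step, enh_levels=None, enh_step=1):
    """Add enhancement-layer DCT refinements to the base-layer coefficients."""
    base = inverse_quantize(base_levels, base_step)
    if enh_levels is None:            # only the base layer was received
        return base
    enh = inverse_quantize(enh_levels, enh_step)
    return [b + e for b, e in zip(base, enh)]

base_only = combine_snr_layers([55, -2, 1, 0], 16)                # [880, -32, 16, 0]
refined = combine_snr_layers([55, -2, 1, 0], 16, [1, 0, -1, 1], 4)  # [884, -32, 12, 4]
```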

The lower layer (base layer) is derived from the first bitstream and can itself be
either non-scalable, or require the spatial or temporal scalability decoding process, and
hence the decoding of an additional bitstream, to be applied. The enhancement layer,
derived from the second bitstream, contains mainly coded DCT coefficients and a small
overhead.

In the current MPEG-2 video coding standard, there is an SNR scalability
extension that allows two levels of scalability. There are at least two disadvantages of
employing the MPEG-2 standard for encoding video data. One disadvantage is that the
scalability granularity is not fine enough, because the MPEG-2 process is an
all-or-none method: the receiving device receives either all of the data from the base
layer and the enhancement layer, or only the data from the base layer bitstream.
Therefore, the granularity is not scalable. In a network environment, more than two
levels of scalability are usually needed.

Another disadvantage is that the enhancement layer coding in MPEG-2 is not 
efficient. Too many bits are needed in the enhancement layer in order to have a 
noticeable increase in video quality. 
The present invention overcomes these and other disadvantages by providing,
among other advantages, an efficient scalable video coding method with increased
granularity.
SUMMARY OF THE INVENTION

The present invention can be characterized as a scalable video coding means
and a system for encoding video data, such that the quality of the final image is
gradually improved as more bits are received. The improved quality and scalability are
achieved by a method wherein an enhancement layer is subdivided into layers or levels
of bitstream layers. Each bitstream layer is capable of carrying information
complementary to the base layer information, in that as each of the enhancement layer
bitstreams is added to the corresponding base layer bitstream, the quality of the
resulting images is improved.

The number N of enhancement layers is determined or limited by the network
that provides the transmission channel to the destination point. While the base layer
bitstream is always transmitted to the destination point, the same is not necessarily true
for the enhancement layers. Each layer is given a priority coding, and transmission is
effectuated according to the priority coding. In the event that all of the enhancement
layers cannot be transmitted, the lower priority coded layers will be omitted. The
omission of one or more enhancement layers may be due to a multitude of reasons.

For instance, the server which provides the transmission channel to the
destination point may be experiencing large demand on its resources from other users.
In order to try and accommodate all of its users, the server will prioritize the data and
only transmit the higher priority coded packets of information. The transmission
channel may be the limiting factor because of the bandwidth of the channel, i.e.,
Internet access port, Ethernet protocol, LAN, WAN, twisted pair cable, co-axial cable,
etc.; or the destination device itself, i.e., modem, absence of an enhanced video card,
etc., may not be able to receive the additional bandwidth made available to it. In these
instances only M (M is an integer number = 0, 1, 2, ...) enhancement layers
may be received, wherein N (N is an integer number = 0, 1, 2, ...)
enhancement layers were generated at the encoding stage, M < N.
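One way to picture this prioritized omission is a server that, for each frame, always sends the base layer and then adds enhancement layers in priority order while they fit. The sketch below assumes per-frame bit counts and a per-frame bit budget are known; the function name and the numbers are hypothetical, and a real server could apply other policies.

```python
def enhancement_layers_to_send(base_bits, enh_bits, budget_bits):
    """Return M, the number of enhancement layers (highest priority first) that fit.

    enh_bits[0] is the highest-priority layer; the base layer is always sent.
    """
    if base_bits > budget_bits:
        raise ValueError("the budget cannot carry the base layer")
    used, m = base_bits, 0
    for size in enh_bits:
        if used + size > budget_bits:
            break                     # this and all lower-priority layers are omitted
        used += size
        m += 1
    return m

layers = [200, 200, 400, 800]         # N = 4 enhancement layers generated
print(enhancement_layers_to_send(300, layers, 1000))   # 2
print(enhancement_layers_to_send(300, layers, 5000))   # 4
```

Stopping at the first layer that does not fit, rather than skipping ahead to a smaller one, keeps the received layers a contiguous prefix, which a decoder that refines from the most significant information down needs.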
To achieve these and other advantages and in accordance with the purpose of
the present invention, as embodied and broadly described, the scalable video method
and apparatus according to one aspect of the invention includes a video encoding
method for adapting a video input to a bandwidth of a transmission channel of a
network. The method includes determining the number N of enhancement layer
bitstreams capable of being adapted to the bandwidth of the transmission channel of
the network. A base layer bitstream is then encoded from the video input, and N
enhancement layer bitstreams are encoded from the video input based on the base
layer bitstream, wherein the plurality of enhancement layer bitstreams complements
the base layer bitstream. The base layer bitstream and the N enhancement layer
bitstreams are then provided to the network.

According to another aspect of the present invention, a video decoding method
for adapting a video input to a bandwidth of a transmission channel of a network
includes determining the number M of enhancement layer bitstreams of said video
input capable of being received from said transmission channel of said network. A
base layer bitstream is decoded from the received video input, and M enhancement
layer bitstreams are decoded from the received video input based on the base layer
bitstream, wherein the M received enhancement layer bitstreams complement the base
layer bitstream. The base layer bitstream and N enhancement layer bitstreams are then
reconstructed.

According to still another aspect of the present invention, a video decoding
method for adapting a video input to a bandwidth of a receiving apparatus includes
demultiplexing a base layer bitstream and at least one of a plurality of enhancement
layer bitstreams received from a network, decoding the base layer bitstream, and
decoding at least one of the plurality of enhancement layer bitstreams based on the
generated base layer bitstream, wherein the at least one of the plurality of
enhancement layer bitstreams enhances the base layer bitstream. A video output is
then reconstructed.

According to a further aspect of the present invention, a video encoding
method for encoding enhancement layers based on a base layer bitstream encoded from
a video input includes taking a difference between an original DCT coefficient and a
reference point, and dividing the difference between the original DCT coefficient and
the reference point into N bit-planes.

According to a still further aspect of the present invention, a method of coding
motion vectors of a plurality of macroblocks includes determining an average motion
vector from N motion vectors for N macroblocks, utilizing the determined average
motion vector as the motion vector for the N macroblocks, and encoding 1/N of the
motion vectors in a base layer bitstream.

Additional features and advantages of the invention will be set forth in the
description which follows, and in part will be apparent from the description, or may be
learned by practice of the invention. The aspects and other advantages of the invention
will be realized and attained by the structure particularly pointed out in the written
description and claims hereof as well as the appended drawings.

It is to be understood that both the foregoing general description and the
following detailed description are exemplary and explanatory and are intended to
provide further explanation of the invention as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are included to provide a further
understanding of the invention and are incorporated in and constitute a part of this
specification, illustrate embodiments of the invention and together with the description
serve to explain the principles of the invention. In the drawings:

Fig. 1 illustrates a flow diagram of the scalable video encoding method of the 
present invention; 

Fig. 2A illustrates conventional probability distribution of DCT coefficient
values;

Fig. 2B illustrates conventional probability distribution of DCT coefficient 
residues; 

Fig. 3A illustrates the probability distribution of DCT coefficient values of the 
present invention; 






Fig. 3B illustrates the probability distribution of DCT coefficient residues of the 
present invention; 
Figs. 3C and 3D illustrate a method for taking a difference of a DCT
coefficient of the present invention;

Fig. 5 illustrates a flow diagram for finding the maximum number of bit-planes 
in the DCT differences of a frame of the present invention; 

Fig. 6 illustrates a flow diagram for generating (RUN, EOP) Symbols of the 
present invention; 

Fig. 7 illustrates a flow diagram for encoding enhancement layers of the
present invention; 

Fig. 8 illustrates a flow diagram for encoding (RUN, EOP) symbols and 
sign_enh values of one DCT block of one bit-plane; 

Fig. 9 illustrates a flow diagram for encoding a sign_enh value of the present 
invention; 

Fig. 10 illustrates a flow diagram for adding enhancement difference to a DCT
coefficient of the present invention; 

Fig. 11 illustrates a flow diagram for converting enhancement difference to a
DCT coefficient of the present invention; 

Fig. 12 illustrates a flow diagram for decoding enhancement layers of the 
present invention; 

Fig. 13 illustrates a flow diagram for decoding (RUN, EOP) symbols and 
sign_enh values of one DCT block of one bit-plane; 

Fig. 14 illustrates a flow diagram for decoding a sign_enh value; and 
Fig. 15 illustrates a prior art conventional SNR scalability flow diagram.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Reference will now be made in detail to the preferred embodiments of the 
present invention, examples of which are illustrated in the accompanying drawings. 

Fig. 1 illustrates the scalable video diagram 10 of an embodiment of the present
invention. The original video input 20 is encoded by the base layer encoder 30 in
accordance with the method represented by flow diagram 400 of Fig. 4. A DCT
coefficient OC and its corresponding base layer quantized DCT coefficient QC are
generated, and a difference is determined pursuant to steps 420 and 430 of Fig. 4. The
difference information from the base layer encoder 30 is passed to the enhancement
layer encoder 40, which encodes the enhancement information.

Encoding by the enhancement layer encoder is performed pursuant to
methods 500 - 900 as depicted in Figs. 5 - 10, respectively, and will be briefly
described. The bitstream from the base layer encoder 30 and the N bitstreams from the
enhancement layer encoder 40 are capable of being sent to the transmission channel 60
by at least two methods.

In a first method, the bitstreams are multiplexed by the server 50 with different
priority identifiers, e.g., the base layer bitstream is guaranteed [...], enhancement
bitstream layer 1 provided by enhancement layer encoder 40 is given a higher priority
than enhancement bitstream layer 2, and the prioritization is continued [...].
Intermediate devices determine the number N of bitstream layers to be generated [...].
This list is not intended to be exhaustive but only representative of potential limiting
devices and/or equipment [...]. The number of bitstream layers M (wherein M is an
integer and M < N) reaching the destination point 100 can be further limited not only
by the physical [...] of the [...] but also by the dropping of bitstream layers according
to their priority.

In a second method, the server 50 knows the transmission channel 60 condition [...].
The destination point 100 receives the bitstream for the base layer and M bitstreams
for the enhancement layer, where M < N.

The bitstreams M are sent to the base layer [...]. The decoding of the enhancement
layer bitstreams is accomplished pursuant to the methods and algorithms depicted in
flow diagrams 1100 - 1400 of Figs. 11 - 14, respectively.

The base layer encoder and decoder are capable of performing logic pursuant
to the MPEG-1, MPEG-2, or MPEG-4 (Version 1) standards, which are hereby
incorporated by reference into this disclosure.



Taking Residue with Probability Distribution Preserved 

A detailed description of taking the residue with the probability distribution preserved
will now be made with reference to Figs. 2A - 3B.

In the current MPEG-2 signal-to-noise ratio (SNR) scalability extension, a
residue or difference is taken between the original DCT coefficient and the quantized
DCT coefficient. Fig. 2A illustrates the probability distribution of the DCT coefficients
of a residual signal: small values have higher probabilities and large values have
smaller probabilities. The intervals along the horizontal axis represent quantization
bins, and the dot in the center of each interval represents the quantized DCT
coefficient. Taking the residue between the original and the quantized DCT coefficient
is equivalent to moving the origin to the quantization point.

Therefore, the probability distribution of the residue becomes that shown in
Fig. 2B. The residue from the positive side of Fig. 2A has a higher probability of
being negative than positive, and the residue taken from the negative side of Fig. 2A
has a higher probability of being positive than negative. As a result, the probability
distribution of the residue becomes almost uniform, which makes the residue more
difficult to code.

A vastly superior method is to generate a difference between the original coefficient
and the lower boundary point of the quantization interval, as shown in Fig. 3A and
Fig. 3B. In this method, the residue taken from the positive side of Fig. 2A remains
positive and the residue from the negative side of Fig. 2A remains negative. Taking the
residue is equivalent to moving the origin to the reference point, as illustrated in
Fig. 3A. Thus, the probability distribution of the residue becomes as shown in Fig. 3B.
This method preserves the shape of the original non-uniform distribution. Although the
dynamic range of the residue taken in this manner seems to be twice that depicted in
Fig. 2B, there is no longer a need to code the sign, i.e., - or +, of the residue. The sign
of the residue is encoded in the base layer bitstream corresponding to the enhancement
layer; therefore, this redundancy is eliminated and the bits representing the sign are
saved. There is only a need to code the magnitude, which still has a non-uniform
distribution.
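The difference between the two ways of taking the residue can be sketched numerically. This assumes a uniform mid-tread quantizer with step Q whose reconstruction points sit at the bin centres, and takes the bin boundary nearer to zero as the "lower boundary"; for the zero bin, where the base layer carries no sign, the reference point is taken to be 0. These conventions are assumptions made for the illustration, not values read from Figs. 3A - 3D.

```python
def quantize_level(x, step):
    """Mid-tread uniform quantizer, rounding half away from zero."""
    q = int(abs(x) / step + 0.5)
    return q if x >= 0 else -q

def residue_from_center(x, step):
    """Conventional residue: original minus the quantized (bin-centre) value."""
    return x - quantize_level(x, step) * step

def residue_from_lower_boundary(x, step):
    """Residue measured from the bin boundary nearer to zero."""
    q = quantize_level(x, step)
    if q == 0:
        return x                      # assumed reference point 0 for the zero bin
    lower = (abs(q) - 0.5) * step     # boundary of the bin on the side of zero
    magnitude = abs(x) - lower        # always in [0, step)
    return magnitude if q > 0 else -magnitude
```

With Q = 10, the value 27 has a centred residue of -3 (its sign disagrees with the value), while its residue from the lower boundary is 2 (same sign). Every residue in the second scheme has a magnitude below Q, twice the span of the centred residue, but its sign never has to be sent for a non-zero base layer coefficient.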

Bit-Plane Coding of Residual DCT Coefficients

After the residues of the DCT coefficients in an 8x8 block are taken, bit-plane
coding is used to code the residue. The bit-plane coding method considers each
residual DCT coefficient as a binary number of several bits instead of as a decimal
integer of a certain value, as in the run-level coding method. The bit-plane coding
method in the present invention only replaces the run-level coding part. Therefore, all
the other syntax elements remain the same.

An example and description of the bit-plane coding method will now be
given, using 64 residual DCT coefficients for an inter-block and 63 residual DCT
coefficients for an intra-block (excluding the intra-DC component, which is coded using
a separate method). The 64 (or 63) residual DCT coefficients are ordered into a
one-dimensional array, and at least one of the residual coefficients is non-zero. The
bit-plane coding method then performs the following steps.

The maximum value of all the residual DCT coefficients in a frame is
determined, and the minimum number of bits, N, needed to represent the maximum
value in the binary format is also determined. N is the number of bit-plane layers for
this frame and is coded in the frame header.
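This step amounts to finding the largest residual magnitude in the frame and counting its binary digits. A minimal sketch, assuming the residues of a frame are available as lists of integers per block (the function name is hypothetical); Python's `int.bit_length` gives the minimum number of bits for a non-negative integer.

```python
def frame_bit_planes(blocks):
    """N = number of bits needed for the largest residual magnitude in the frame."""
    largest = max((abs(v) for block in blocks for v in block), default=0)
    return largest.bit_length()

print(frame_bit_planes([[10, 0, 6, 0], [3, 0, 2, 1]]))   # 4, since 10 = 0b1010
print(frame_bit_planes([[10, 0, 6], [40, 0]]))            # 6, since 40 = 0b101000
```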

Within each 8x8 block, every one of the 64 (or 63) residual DCT coefficients
is represented with N bits in the binary format, and N bit-planes or layers are formed,
each bit-plane consisting of the bits of every residual DCT coefficient at the same
significant position.

The most significant bit-plane of the block that contains a 1 (the MSB plane) is
determined, and then the number of all-zero bit-planes between the most significant of
the N bit-planes and the MSB plane of the block is determined and coded. Then,
starting from the MSB plane, 2-D symbols are formed of two components: (a) the
number of consecutive 0's before a 1 (RUN), and (b) whether there are any 1's left on
this bit-plane, i.e., End-Of-Plane (EOP). If a bit-plane after the MSB plane contains all
0's, a special symbol ALL-ZERO is formed to represent an all-zero bit-plane. Note that
the MSB plane does not have the all-zero case because any all-zero bit-planes before
the MSB plane have been coded in the previous steps.
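The steps above can be sketched for a single block as follows. The sketch assumes the residual magnitudes are already in zig-zag order and that at least one is non-zero, as the text requires; symbols are Python tuples `(RUN, EOP)` and the string `"ALL_ZERO"` stands in for the ALL-ZERO symbol. The function name is hypothetical, and the VLC codewords themselves are not modelled.

```python
ALL_ZERO = "ALL_ZERO"

def block_symbols(residues, n_planes):
    """Return (leading all-zero planes, per-plane symbol lists, MSB plane first)."""
    mags = [abs(v) for v in residues]
    msb = max(mags).bit_length()               # planes actually used by this block
    leading = n_planes - msb                   # all-zero planes above the block's MSB
    planes = []
    for p in range(msb - 1, -1, -1):           # bit position p, most significant first
        ones = [i for i, m in enumerate(mags) if (m >> p) & 1]
        if not ones:
            planes.append([ALL_ZERO])          # cannot happen on the MSB plane itself
            continue
        symbols, prev = [], -1
        for k, i in enumerate(ones):
            run = i - prev - 1                 # number of 0's before this 1
            eop = 1 if k == len(ones) - 1 else 0   # 1 = no more 1's on this plane
            symbols.append((run, eop))
            prev = i
        planes.append(symbols)
    return leading, planes

example = [10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1] + [0] * 49
leading, planes = block_symbols(example, n_planes=6)
```

On the worked example in this section (N = 6), this yields 2 leading all-zero planes and the 10 (RUN, EOP) symbols listed there, starting with (0,1) on the MSB plane.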

Four 2-D VLC tables are used: VLC-Table-0 corresponds to the MSB plane;
VLC-Table-1 corresponds to the second MSB plane; VLC-Table-2 corresponds to the
third MSB plane; and VLC-Table-3 corresponds to the fourth MSB plane and all the
lower bit-planes. For the ESCAPE cases, RUN is coded with 6 bits and EOP is coded
with 1 bit. Escape coding is a method for individually coding very-low-probability
events that are not in the coding tables.
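The table selection and the fixed-length ESCAPE fields can be sketched as below. The table contents and the escape codeword that precedes the fixed-length fields are not given in this description, so only the table index and the 7 field bits are modelled; both helper names are hypothetical.

```python
def vlc_table_index(plane_from_msb):
    """0 for the MSB plane, 1 and 2 for the next two planes, 3 for all lower planes."""
    if plane_from_msb < 0:
        raise ValueError("plane index counts down from the MSB plane, starting at 0")
    return min(plane_from_msb, 3)

def escape_fields(run, eop):
    """Fixed-length payload of an ESCAPE case: RUN in 6 bits, then EOP in 1 bit."""
    if not 0 <= run < 64 or eop not in (0, 1):
        raise ValueError("RUN must fit in 6 bits and EOP must be 0 or 1")
    return format(run, "06b") + str(eop)

print([vlc_table_index(k) for k in range(6)])   # [0, 1, 2, 3, 3, 3]
print(escape_fields(8, 1))                      # 0010001
```

Six bits are enough for any RUN in a 64-coefficient block, since a run can be at most 63.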

An example of the above process will now follow. For illustration purposes,
assume that the residual values after the zig-zag ordering are as follows and that
N = 6:



10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, 0, ... 0, 0 

The maximum value in this block is found to be 10 and the minimum number of 
bits to represent 10 in the binary format (1010) is 4. Therefore, two all-zero bit-planes 
before the MSB plane are coded with a code for the value 2 and the remaining 4 bit- 
planes are coded using the (RUN, EOP) codes. Writing every value in the binary 
format using 4 bits, the 4 bit-planes are formed as follows: 

1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 (MSB-plane)

0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 (Second MSB-plane)

1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0 (Third MSB-plane)

0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0 (Fourth MSB-plane or LSB-plane)

Converting the bits of each bit-plane into (RUN, EOP) symbols results in the 
following: 



(0,1) (MSB-plane)

(2,1) (Second MSB-plane)

(0,0), (1,0), (2,0), (1,0), (0,0), (2,1) (Third MSB-plane)

(5,0), (8,1) (Fourth MSB-plane or LSB-plane)

Therefore, there are 10 symbols to be coded using the (RUN, EOP) VLC
tables. Based on their locations in the bit-planes, different VLC tables are used for the
coding. The enhancement bitstream using all four bit-planes looks as follows:
code_leading_all_zero(2)
code_msb(0,1)
code_msb-1(2,1)
code_msb-2(0,0), code_msb-2(1,0), code_msb-2(2,0), code_msb-2(1,0), code_msb-2(0,0), code_msb-2(2,1)
code_msb-3(5,0), code_msb-3(8,1)

In an alternative embodiment, several enhancement bitstreams may be formed
from the four bit-planes in this example, each from a respective set comprising one or
more of the four bit-planes.
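Forming several enhancement bitstreams from sets of bit-planes can be sketched as below, using the symbolic code names of the example rather than real VLC codewords. The grouping (for instance one bit-plane per bitstream for the upper planes and the rest together) and placing the leading all-zero count in the first bitstream are assumptions for illustration; the function name is hypothetical.

```python
def enhancement_bitstreams(leading_all_zero, planes, group_sizes):
    """Split per-plane symbols (MSB plane first) into consecutive enhancement bitstreams."""
    if sum(group_sizes) != len(planes) or min(group_sizes) < 1:
        raise ValueError("each bitstream needs at least one plane; all planes must be used")
    labelled = []
    for k, symbols in enumerate(planes):
        name = "code_msb" if k == 0 else f"code_msb-{k}"
        labelled.append([f"{name}(ALL_ZERO)" if s == "ALL_ZERO" else f"{name}({s[0]},{s[1]})"
                         for s in symbols])
    labelled[0].insert(0, f"code_leading_all_zero({leading_all_zero})")
    streams, start = [], 0
    for size in group_sizes:
        streams.append([code for plane in labelled[start:start + size] for code in plane])
        start += size
    return streams

planes = [[(0, 1)], [(2, 1)], [(0, 0), (1, 0), (2, 0), (1, 0), (0, 0), (2, 1)], [(5, 0), (8, 1)]]
single = enhancement_bitstreams(2, planes, [4])       # one bitstream with all four planes
layered = enhancement_bitstreams(2, planes, [1, 1, 2])
```

A receiver that gets only the first M of the layered bitstreams still has the most significant planes, so each additional bitstream refines the coefficients further.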

Motion Vector Sharing

In this alternative embodiment of the present invention, motion vector sharing
can be utilized when the base layer bitstream exceeds a predetermined size or when
more levels of scalability are needed for the enhancement layer. Lowering the number
of bits required for coding the motion vectors in the base layer reduces the bandwidth
requirement of the base layer bitstream. In base layer coding, a macroblock (16x16
pixels for the luminance components) of the current frame is compared with the
previous frame within a search range. The closest match in the previous frame is used
as a prediction of the current macroblock. The relative displacement of the prediction
with respect to the current macroblock, in the horizontal and vertical directions, is
called a motion vector.

The difference between the current macroblock and its prediction is coded
using DCT coding. In order for the decoder to reconstruct the current macroblock, the
motion vector has to be coded in the bitstream. Since there is a fixed number of bits for
coding a frame, the more bits are spent on coding the motion vectors, the fewer bits
remain for coding the motion-compensated difference. Therefore, it is desirable to
lower the number of bits for coding the motion vectors and leave more bits for coding
the differences between the current macroblock and its prediction.
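The search for the closest match can be sketched as an exhaustive (full-search) block matching that minimizes the sum of absolute differences (SAD). SAD and full search are common choices but are assumptions here, since the text does not name a matching criterion; vectors are integer (vertical, horizontal) offsets, and candidates that leave the reference frame are skipped.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[top + i][left + j] - ref[top + dy + i][left + dx + j])
               for i in range(size) for j in range(size))

def full_search(cur, ref, top, left, size, search_range):
    """Motion vector (dy, dx) of the best match within +/- search_range pixels."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > height or c + size > width:
                continue                       # candidate falls outside the frame
            cost = sad(cur, ref, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv

ref = [[100 * r + c for c in range(12)] for r in range(12)]
cur = [[100 * (r + 1) + (c - 2) for c in range(12)] for r in range(12)]  # cur[r][c] == ref[r+1][c-2]
print(full_search(cur, ref, top=4, left=4, size=4, search_range=3))    # (1, -2)
```

Full search is simple but costs (2R + 1)^2 block comparisons per macroblock for a search range R, which is why practical encoders often use faster search patterns.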

For each set of 2x2 motion vectors, the average motion vector can be
determined and used for the four macroblocks. In order not to change the syntax of
the base layer coding, the four macroblocks are forced to have identical motion
vectors. Since only one out of four motion vectors is coded in the bitstream, the
number of bits spent on motion vector coding is reduced, and more bits are available
for coding the differences. The cost of this method is that the four macroblocks that
share the same motion vector may not individually get the best-matched prediction,
and the motion-compensated difference may have a larger dynamic range, thus
necessitating more bits to code the difference.
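Averaging each 2x2 group of motion vectors can be sketched as below. The vectors are assumed to be integers in whatever units the base layer codes them (for instance half-pel units), and the average is rounded half away from zero; neither the rounding rule nor the function names come from the text.

```python
def _round_half_away(x):
    return int(x + 0.5) if x >= 0 else -int(-x + 0.5)

def share_motion_vectors(mv_field):
    """Replace each 2x2 group of macroblock vectors by the group's average vector.

    mv_field[r][c] is the (dy, dx) vector of the macroblock in row r, column c.
    Returns the shared field and the list of vectors that actually need coding.
    """
    rows, cols = len(mv_field), len(mv_field[0])
    if rows % 2 or cols % 2:
        raise ValueError("expects an even number of macroblock rows and columns")
    shared = [[None] * cols for _ in range(rows)]
    coded = []
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            group = [mv_field[r + i][c + j] for i in (0, 1) for j in (0, 1)]
            avg = tuple(_round_half_away(sum(v[k] for v in group) / 4) for k in (0, 1))
            coded.append(avg)                  # one vector per four macroblocks
            for i in (0, 1):
                for j in (0, 1):
                    shared[r + i][c + j] = avg
    return shared, coded

field = [[(2, 4), (2, 4), (-1, 0), (-1, 0)],
         [(4, 0), (0, 0), (-1, -2), (-3, -2)]]
shared, coded = share_motion_vectors(field)    # coded == [(2, 2), (-2, -1)]
```

Eight macroblocks end up needing only two distinct vectors, at the price of a poorer prediction for blocks whose own vector differed from the group average.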

For a given fixed bitrate, the savings from coding one out of four motion
vectors may not compensate for the increased number of bits required to code the
difference with a larger dynamic range. However, for a time-varying bitrate, a wider
dynamic range for the enhancement layer provides more flexibility to achieve the best
possible usage of the available bandwidth.

Coding Sign Bits

In an alternative embodiment of the present invention, if the base layer
quantized DCT coefficient is non-zero, the corresponding enhancement layer
difference will have the same sign as the base layer quantized DCT coefficient.
Therefore, there is no need to code the sign bit in the enhancement layer.

Conversely, if the base layer quantized DCT coefficient is zero and the
corresponding enhancement layer difference is non-zero, a sign bit is placed into the
enhancement layer bitstream immediately after the MSB of the difference.
An example of the above method will now follow. 

Difference of a DCT block after ordering:

10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, 0, ... 0, 0

Sign indications of the DCT block after ordering:

3, 3, 3, 3, 2, 0, 3, 3, 1, 2, 2, 0, 3, 3, 1, 2, ... 2, 3

- 0: base layer quantized DCT coefficient = 0 and difference > 0

- 1: base layer quantized DCT coefficient = 0 and difference < 0

- 2: base layer quantized DCT coefficient = 0 and difference = 0

- 3: base layer quantized DCT coefficient ≠ 0.

In this example, the sign bits associated with the values 10, 6, and the first 2
do not need to be coded, and the sign bits associated with 3, 2, 2, and 1 are coded in
the following way:
code(All Zero)
code(All Zero)
code(0,1)
code(2,1)
code(0,0), code(1,0), code(2,0), 0, code(1,0), code(0,0), 1, code(2,1), 0
code(5,0), code(8,1), 1

For every DCT difference there is an associated sign indication, with four
possible cases, denoted 0, 1, 2, and 3 above. If the sign indication is 2 or 3, the sign bit
does not have to be coded, because the difference is either zero or its sign is available
from the corresponding base layer data. If the sign indication is 0 or 1, a sign bit is
required once per difference value, i.e., not in every bit-plane of the difference value.
Therefore, the sign bit is put immediately after the most significant bit of the
difference. In the example, a sign bit of 0 denotes a positive difference and 1 a
negative difference.
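The sign rule can be folded into the bit-plane symbol generation as sketched below. The symbolic token names follow the example above (`code(All Zero)`, `code(RUN,EOP)`, and a bare `0` or `1` for a sign bit), leading all-zero planes are emitted as individual `code(All Zero)` tokens as in that example, and 0 is taken to mean a positive difference and 1 a negative one, as the example implies. The helper names are hypothetical, and the remaining zero differences in the usage example are given indication 2 arbitrarily, since zero differences never carry a sign bit.

```python
def sign_indication(base_level, diff):
    """0/1: base coefficient is 0 and diff > 0 / < 0; 2: both are 0; 3: base non-zero."""
    if base_level != 0:
        return 3
    if diff > 0:
        return 0
    if diff < 0:
        return 1
    return 2

def encode_block_with_signs(diffs, indications, n_planes):
    """Symbolic bit-plane stream for one block, with sign bits after each MSB."""
    mags = [abs(d) for d in diffs]
    msb = max(mags).bit_length()
    tokens = ["code(All Zero)"] * (n_planes - msb)     # leading all-zero planes
    for p in range(msb - 1, -1, -1):
        ones = [i for i, m in enumerate(mags) if (m >> p) & 1]
        if not ones:
            tokens.append("code(All Zero)")
            continue
        prev = -1
        for k, i in enumerate(ones):
            eop = 1 if k == len(ones) - 1 else 0
            tokens.append(f"code({i - prev - 1},{eop})")
            prev = i
            # Sign bit only where the base layer cannot supply it (indication 0 or 1),
            # and only on the plane holding this difference's most significant 1.
            if indications[i] in (0, 1) and mags[i].bit_length() - 1 == p:
                tokens.append(str(indications[i]))
    return tokens

diffs = [10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, 0] + [0] * 48
signs = [3, 3, 3, 3, 2, 0, 3, 3, 1, 2, 2, 0, 3, 3, 1, 2] + [2] * 48
stream = encode_block_with_signs(diffs, signs, n_planes=6)
```

Running it on the example difference values and sign indications reproduces the coded sequence given in the text, including the sign bits 0, 1, 0 on the third plane and 1 on the last plane.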

Optimal Reconstruction of the DCT Coefficients

In an alternative embodiment of the present invention, even though N
enhancement bitstream layers or planes may have been generated, only M (M < N)
enhancement layers may be available for reconstruction of the DCT coefficients. Due
to the channel capacity and other constraints, such as congestion, the decoder 80 of
Fig. 1 may receive no enhancement difference or only a partial enhancement
difference. In such a case, the optimal reconstruction of the DCT coefficients can
proceed as follows.

If the decoded difference = 0, the reconstruction point is the same as that in the base
layer; otherwise,

$$\text{reconstructed difference} = \text{decoded difference} + \tfrac{1}{2}\cdot 2^{\text{decoded\_bit\_plane}}$$

$$\text{reconstruction point} = \text{reference point} + \text{reconstructed difference}\cdot Q_{\text{enh}} + \tfrac{Q_{\text{enh}}}{2}$$

In the present embodiment, referring to Figs. 3C and 3D, the optimal
reconstruction point is not the lower boundary of a quantization bin. The above
method specifies how to obtain the optimal reconstruction point in cases where the
difference is quantized and received partially, i.e., not all of the enhancement layers
generated are transmitted or received, as shown in Fig. 1, wherein M < N.
What is claimed is:

1. A video encoding method for adapting a video input to a bandwidth of a
transmission channel of a network, the method comprising the steps of:

determining number N of enhancement layer bitstreams capable of being
adapted to said bandwidth of said transmission channel of said network;

encoding a base layer bitstream from said video input;

encoding N number of enhancement layer bitstreams from said video
input based on the base layer bitstream, wherein the
N enhancement layer bitstreams complement the base layer bitstream; and

providing the base layer bitstream and N enhancement layer bitstreams to said
network.

2. The video encoding method according to claim 1, wherein the 
determining step includes negotiating with intermediate devices on said 
network. 

3. The video encoding method according to claim 2, wherein
negotiating includes determining destination resources.

4. The video encoding method according to claim 1, wherein the step of
encoding the base layer bitstream is performed by an MPEG-1 encoding
method.



5. The video encoding method according to claim 1, wherein the step of
encoding the base layer bitstream is performed by an MPEG-2 encoding
method.



6. The video encoding method according to claim 1, wherein the step of
encoding the base layer bitstream is performed by an MPEG-4 encoding
method.
7. The video encoding method according to claim 1, wherein the step of
encoding the base layer bitstream is performed by a Discrete Cosine
Transform (DCT) method.

8. The video encoding method according to claim 7, wherein, after
encoding the base layer bitstream by a Discrete Cosine Transform (DCT)
method, a DCT coefficient is quantized.



9. The video encoding method according to claim 1, wherein the enhancement
layer bitstreams are based on the difference of an original base layer DCT
coefficient and a corresponding base layer quantized DCT coefficient.

10. The video encoding method according to claim 1, wherein the base
layer bitstream and the N enhancement layer bitstreams provided to the network
are multiplexed.



11. A video decoding method for adapting a video input to a bandwidth of a
transmission channel of a network, the method comprising the steps of:

determining a number M of enhancement layer bitstreams of said video input
capable of being received from said transmission channel of said network;

decoding a base layer bitstream from the received video input;

decoding M enhancement layer bitstreams from the received video input based
on the base layer bitstream, wherein the M received enhancement layer
bitstreams complement the base layer bitstream; and

reconstructing the base layer bitstream and the M enhancement layer bitstreams.



12. The video decoding method according to claim 11, wherein the
determining step includes negotiating with intermediate devices on said
network.
13. The video decoding method according to claim 12, wherein
negotiating includes determining destination resources.



14. The video decoding method according to claim 11, wherein the step of
decoding the base layer bitstream is performed by an MPEG-1 decoding
method.

15. The video decoding method according to claim 11, wherein the step of
decoding the base layer bitstream is performed by an MPEG-2 decoding
method.

16. The video decoding method according to claim 11, wherein the step of
decoding the base layer bitstream is performed by an MPEG-4 decoding
method.

17. The video decoding method according to claim 11, wherein the step of
decoding the base layer bitstream is performed by a Discrete Cosine
Transform (DCT) method.

18. The video decoding method according to claim 17, wherein, after
decoding the base layer bitstream by a Discrete Cosine Transform (DCT)
method, a DCT coefficient is unquantized.

19. The video decoding method according to claim 11, wherein coding of the
enhancement layer bitstreams is based on the difference of an original base
layer DCT coefficient and a corresponding base layer quantized DCT
coefficient.

20. The video decoding method according to claim 11, wherein the base
layer bitstream and the M enhancement layers to be reconstructed are
demultiplexed.
21. A video decoding method for adapting a video input to a bandwidth of a
receiving apparatus, the method comprising the steps of:

demultiplexing a base layer bitstream and at least one of a plurality of
enhancement layer bitstreams received from a network;

decoding the base layer bitstream;

decoding at least one of the plurality of enhancement layer bitstreams based
on the generated base layer bitstream, wherein the at least one of the plurality
of enhancement layer bitstreams enhances the base layer bitstream; and

reconstructing a video output.

22. A video encoding method for encoding enhancement layers based on a base
layer bitstream encoded from a video input, the video encoding method comprising the
steps of:

taking a difference between an original DCT coefficient and a reference point; and

dividing the difference between the original DCT coefficient and the reference
point into N bit-planes.

23. The video encoding method according to claim 22, wherein RUN and EOP
symbols represent the N bit-planes of a DCT block.

24. The video encoding method according to claim 23, wherein the RUN and EOP 
symbols are encoded. 

25. The video encoding method according to claim 24, wherein a sign bit is 
encoded if the DCT difference is equal to zero or the sign of the DCT difference is the 
same as the corresponding base layer bitstream data. 






WO 00/05898

PCT/US99/16638 

26. A video decoding method for reconstructing DCT coefficients when M
enhancement layers of N enhancement layers have been received, wherein M < N,
comprising:

means for taking a reconstruction difference as a decoded difference and a
portion of a decoded bit-plane;

means for taking a reconstruction point as a reference point and a
reconstructed difference; and

determining an optimal reconstruction point.

27. A method of coding motion vectors of a plurality of macroblocks, the method
comprising the steps of:

determining an average motion vector from N motion vectors for N
macroblocks;

utilizing the determined average motion vector as the motion vector for the N
macroblocks; and

encoding 1/N motion vectors in a base layer bitstream.
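The bit-plane division of claim 22 and the RUN/EOP representation of claim 23 can be illustrated with a short Python sketch. It assumes that the DCT differences of one block are already in scan order, that each bit-plane is coded as (RUN, EOP) pairs where RUN counts the zeros preceding a 1 and EOP flags the last 1 in the plane, and that signs are coded separately (claim 25) and are not handled here. How an all-zero plane is signalled is not specified in this section, so the sketch simply returns an empty list for it. The function names `to_bit_planes` and `run_eop_symbols` and the example data are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_bit_planes(differences: Sequence[int], n: int) -> List[List[int]]:
    """Split the magnitudes of one block's DCT differences into N bit-planes.

    Element 0 of the result is the most significant bit-plane.
    Signs are assumed to be coded separately and are ignored here.
    """
    if any(abs(d) >> n for d in differences):
        raise ValueError("N bit-planes are too few for these differences")
    return [[(abs(d) >> p) & 1 for d in differences]
            for p in range(n - 1, -1, -1)]


def run_eop_symbols(plane: Sequence[int]) -> List[Tuple[int, bool]]:
    """Represent one bit-plane as (RUN, EOP) pairs (hypothetical layout).

    RUN is the number of 0 bits before each 1 bit; EOP is True only for the
    last 1 bit in the plane. An all-zero plane yields an empty list here.
    """
    ones = [i for i, bit in enumerate(plane) if bit]
    symbols: List[Tuple[int, bool]] = []
    previous = -1
    for k, position in enumerate(ones):
        symbols.append((position - previous - 1, k == len(ones) - 1))
        previous = position
    return symbols


# Example: differences between original DCT coefficients and reference points.
original = [12, 7, 4, 9]
reference = [7, 7, 7, 8]
differences = [o - r for o, r in zip(original, reference)]  # [5, 0, -3, 1]
for plane in to_bit_planes(differences, 3):
    print(plane, run_eop_symbols(plane))
```

With N = 3 the example prints the most significant plane `[1, 0, 0, 0]` as `[(0, True)]` and the least significant plane `[1, 0, 1, 1]` as `[(0, False), (1, False), (0, True)]`, so each plane can be truncated independently of the ones below it.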